F#: A day of writing a little twitter application
I spent most of the bank holiday Monday here in Sydney writing a little application to scan through my twitter feed and pick out just the tweets which have links in them, since for me that’s where a lot of the value of twitter lies.
I’m sure someone has done this already but it seemed like a good opportunity to try and put a little of the F# that I’ve learned from reading Real World Functional Programming to use. The code I’ve written so far is at the end of this post.
== What did I learn?
- I didn’t really want to write a wrapper on top of the twitter API so I put out a request for suggestions for a .NET twitter API. It pretty much seemed to be a choice of either Yedda or tweetsharp and since the latter seemed easier to use I went with that. In the code you see at the end I have added the 'Before' method to the API because I needed it for what I wanted to do.
- I found it really difficult writing the 'findLinks' function - the way I’ve written it at the moment uses pattern matching and recursion, which isn’t something I’ve spent a lot of time doing. Whenever I tried to think how to solve the problem my mind just wouldn’t move away from the procedural approach of going down the collection, setting a flag depending on whether we had a 'lastId' or not and so on. Eventually I explained the problem to Alex and working through it together we realised that there are three paths that the code can take:
  - When we have processed all the tweets and want to exit
  - The first call to get tweets when we don’t have a 'lastId' starting point - I was able to get 20 tweets at a time through the API
  - Subsequent calls to get tweets when we have a 'lastId' from which we want to work backwards

  I think it is probably possible to reduce the code in this function to follow just one path by passing in the function to find the tweets but I haven’t been able to get this working yet.
- I recently watched an F# video from Alt.NET Seattle featuring Amanda Laucher where she spoke of the need to explicitly state types that we import from C# into our F# code. You can see that I needed to do that in my code when referencing the TwitterStatus class - I guess it would be pretty difficult for the use of that class to be inferred but it still made the code a bit more clunky than any of the other simple problems I’ve played with before.
- I’ve not used any of the functions on 'Seq' until today - from what I understand these are available for applying operations to any collections which implement IEnumerable - which is exactly what I had!
- I had to use the following code to allow F# interactive to recognise the Dimebrain namespace:

  ~~~text
  #r "\path\to\Dimebrain.Tweetsharp.dll"
  ~~~

  I thought it would be enough to reference it in my Visual Studio project and reference the namespace but apparently not.
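To make the points about type annotations and 'Seq' a bit more concrete, here is a minimal sketch that doesn’t need TweetSharp at all. It assumes plain strings as stand-ins for tweet text, and the names `withLinks`, `sample` and `linked` are only illustrative. If the `seq<string>` annotation is removed, the compiler can’t resolve `text.Contains` because it doesn’t yet know what type `text` is.

```fsharp
// Stand-in for the TwitterStatus case: the annotation tells the compiler
// what 'text' is, so the .Contains member lookup can be resolved.
let withLinks (texts: seq<string>) =
    texts |> Seq.filter (fun text -> text.Contains("http"))

// ResizeArray is F#'s name for System.Collections.Generic.List<T>.
// It implements IEnumerable<T>, so the Seq functions accept it directly.
let sample = ResizeArray<string>()
sample.Add("reading Real World Functional Programming")
sample.Add("nice F# post http://example.com/fsharp")
sample.Add("lunch time")

// Seq functions are lazy, so force the result into a list before printing.
let linked = sample |> withLinks |> Seq.toList
printfn "%A" linked
```

The same `Seq` functions should work on whatever collection a .NET library hands back, as long as it implements IEnumerable.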
== The code
This is the code I have at the moment - there are certainly some areas where it can be improved but I’m not exactly sure how to do it. In particular:

- What’s the best way to structure F# code? I haven’t seen any resources on how to do this so it’d be cool if someone could point me in the right direction. The code I’ve written is just a collection of functions which doesn’t really have any structure at all.
- Reducing duplication - I hate the fact I’ve basically got the same code twice in the 'getStatusesBefore' and 'getLatestStatuses' functions - I wasn’t sure of the best way to refactor that. Maybe putting the common code up to the 'OnFriendsTimeline' call into a common function and then calling that from the other two functions? I think a similar approach can be applied to 'findLinks' as well.
- The code doesn’t feel that expressive to me - I was debating whether or not I should have passed a type into the 'findLinks' function - right now it’s only possible to tell what each part of the tuple means by reading the pattern matching code, which feels wrong. I think there may also be some opportunities to use the function composition operator but I couldn’t quite see where.
- How much context should we put in the names of functions? Most of my programming has been in OO languages where whenever we have a method its context is defined by the object on which it resides. When naming functions such as 'findOldestStatus' and 'oldestStatusId' I wasn’t sure whether or not I was putting too much context into the function name. I took the alternative approach with the 'withLinks' function since I think it reads more clearly like that when it’s actually used.
// Import required namespaces
open Dimebrain.TweetSharp.Fluent
open Dimebrain.TweetSharp.Extensions
open Dimebrain.TweetSharp.Model
open Microsoft.FSharp.Core.Operators

// Get the statuses older than a given status ID
let getStatusesBefore (statusId: int64) =
    FluentTwitter.CreateRequest()
        .AuthenticateAs("userName", "password")
        .Statuses()
        .OnFriendsTimeline()
        .Before(statusId)
        .AsJson()
        .Request()
        .AsStatuses()

// Keep only the statuses whose text contains a link
let withLinks (statuses: seq<TwitterStatus>) =
    statuses |> Seq.filter (fun eachStatus -> eachStatus.Text.Contains("http"))

// Print each status as "[screen name] text"
let print (statuses: seq<TwitterStatus>) =
    for status in statuses do
        printfn "[%s] %s" status.User.ScreenName status.Text

// Get the latest statuses (takes unit so that each call makes a fresh request)
let getLatestStatuses () =
    FluentTwitter.CreateRequest()
        .AuthenticateAs("userName", "password")
        .Statuses()
        .OnFriendsTimeline()
        .AsJson()
        .Request()
        .AsStatuses()

// Find the status with the smallest ID, i.e. the oldest one
let findOldestStatus (statuses: seq<TwitterStatus>) =
    statuses |> Seq.sortBy (fun eachStatus -> eachStatus.Id) |> Seq.head

// Retrieve the oldest status ID from the latest statuses
let oldestStatusId = (getLatestStatuses () |> findOldestStatus).Id

// Recursively print statuses with links; args is (lastId, numberProcessed, recordsToSearch)
let rec findLinks (args: int64 * int * int) =
    match args with
    // All the requested statuses have been processed, so stop
    | (_, numberProcessed, recordsToSearch) when numberProcessed >= recordsToSearch -> ()
    // First call: no lastId yet, so start from the latest statuses
    | (0L, numberProcessed, recordsToSearch) ->
        let latestStatuses = getLatestStatuses ()
        (latestStatuses |> withLinks) |> print
        findLinks(findOldestStatus(latestStatuses).Id, numberProcessed + 20, recordsToSearch)
    // Subsequent calls: work backwards from lastId
    | (lastId, numberProcessed, recordsToSearch) ->
        let latestStatuses = getStatusesBefore lastId
        (latestStatuses |> withLinks) |> print
        findLinks(findOldestStatus(latestStatuses).Id, numberProcessed + 20, recordsToSearch)

// Start the search for statuses with links
let findStatusesWithLinks recordsToSearch = findLinks(0L, 0, recordsToSearch)
And to use it to find the links contained in the most recent 100 statuses of the people I follow:
findStatusesWithLinks 100;;
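One possible shape for the single-path version of 'findLinks', with the fetch function passed in and a record replacing the tuple, is sketched below. This is only a sketch under assumptions: `Status`, `SearchState`, `fakeTimeline` and `fetchPage` are made-up stand-ins so that it runs without TweetSharp or a network connection, and it collects the matching statuses into a list rather than printing them. With the real API, the fetch function would presumably wrap the 'getLatestStatuses' and 'getStatusesBefore' calls.

```fsharp
// Hypothetical stand-in for TwitterStatus so the sketch runs on its own.
type Status = { Id: int64; ScreenName: string; Text: string }

// Named fields replace the (lastId, numberProcessed, recordsToSearch) tuple.
type SearchState = { LastId: int64 option; Processed: int; ToSearch: int }

// Fake timeline of 50 statuses, newest first; every fifth one contains a link.
let fakeTimeline =
    [ for id in 50L .. -1L .. 1L do
        let text =
            if id % 5L = 0L then sprintf "a link: http://example.com/%d" id
            else sprintf "no link in status %d" id
        yield { Id = id; ScreenName = "someone"; Text = text } ]

// One fetch function covers both cases: None means "the latest page",
// Some id means "the page of statuses older than id".
let fetchPage (pageSize: int) (lastId: int64 option) : Status list =
    let candidates =
        match lastId with
        | None -> fakeTimeline
        | Some id -> fakeTimeline |> List.filter (fun s -> s.Id < id)
    candidates |> List.truncate pageSize

let withLinks (statuses: seq<Status>) =
    statuses |> Seq.filter (fun s -> s.Text.Contains("http"))

// A single recursive path: the fetch function decides what "next page" means.
// 'found' accumulates matches in reverse order and is reversed on exit.
let rec findLinks (fetch: int64 option -> Status list) (state: SearchState) (found: Status list) =
    if state.Processed >= state.ToSearch then
        List.rev found
    else
        match fetch state.LastId with
        | [] -> List.rev found // nothing older left to fetch
        | page ->
            let oldest = page |> List.minBy (fun s -> s.Id)
            let links = page |> withLinks |> Seq.toList
            findLinks fetch
                { state with LastId = Some oldest.Id; Processed = state.Processed + List.length page }
                (List.rev links @ found)

let findStatusesWithLinks recordsToSearch =
    findLinks (fetchPage 20) { LastId = None; Processed = 0; ToSearch = recordsToSearch } []

// IDs of the linked statuses found in the 40 most recent fake statuses.
let linkIds = findStatusesWithLinks 40 |> List.map (fun s -> s.Id)
printfn "%A" linkIds
```

Using `int64 option` for the last ID removes the need for the special `0L` value, and the record’s field names say what each part means without having to read the pattern match.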
Any advice on how to improve this will be gratefully received. I’m going to continue working this into a little DSL which can print me up a nice summary of the links that have been posted during the times that I’m not on twitter watching what’s going on.
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.