Extracting information from a picture, round 1
This week, I wanted to get information I found on the nice map, below. I could not get access to the original dataset, per zip code… and I was wondering, if (assuming that the map was with high...
Extracting information from a picture, round 2
Yesterday, I published a post on extracting information from a picture, but it did not work as expected. I claimed that it was because of the original graph I had. More precisely, it was based on some...
Exotic link functions for GLMs
In my previous post on GLMs, I discussed power link functions. But there are many more links that can be used: The square root link (for the Poisson model). Consider some random variable Y with mean...
Estimates on training vs. validation samples
Before moving to cross-validation, it was natural to say “I will burn 50% (say) of my data to train a model, and then use the remaining to fit the model”. For instance, we can use training data for...
Optimal transport on large networks
With Alfred Galichon and Lucas Vernet, we recently uploaded a paper entitled Optimal transport on large networks on arXiv. This article presents a set of tools for the modeling of a spatial allocation...
Insurance data science : use and value of unusual data #1
Next week, I will be at the Summer School of the Swiss Association of Actuaries, in Lausanne, with Jean-Philippe Boucher (UQAM) and Ewen Gallic (AMSE). I will give an introductory talk on...
Insurance data science : Pictures
At the Summer School of the Swiss Association of Actuaries, in Lausanne, following the part of Jean-Philippe Boucher (UQAM) on telematic data, I will start talking about pictures this Wednesday....
Insurance data science : Text
At the Summer School of the Swiss Association of Actuaries, in Lausanne, I will start talking about text-based data and NLP this Thursday. Slides are available online. Ewen Gallic (AMSE) will present a...
Insurance data science : Networks
At the Summer School of the Swiss Association of Actuaries, in Lausanne, I will start talking about networks and insurance this Friday. Slides are available online.
On leverage
Last week, in our STT5100 (applied linear models) class, I introduced the hat matrix, and the notion of leverage. In a classical regression model, \boldsymbol{y}=\boldsymbol{X}\boldsymbol{\beta} (in...
Combining automatically factor levels with trees
Last year, in a post, I discussed how to merge levels of factor variables, using combinatorial techniques (it was for my STT5100 course, and trees are not in the syllabus), with an extension on trees at...
On the conjugate function
In the MAT7381 course (graduate course on regression models), we will talk about optimization, and a classical tool is the so-called conjugate. Given a function f:\mathbb{R}^p\to\mathbb{R} its...
On Cochran Theorem (and Orthogonal Projections)
Cochran Theorem (from The distribution of quadratic forms in a normal system, with applications to the analysis of covariance, published in 1934) is probably the most important one in a regression...
Quantile Regression (home made, part 2)
A few months ago, I posted a note with some home made codes for quantile regression… there was something odd on the output, but it was because there was a (small) mathematical problem in my equation....
Lasso Regression (home made)
Again, this post is related to my MAT7381 course, where we will see that it is actually possible to write our own code to compute Lasso regression,...
Testing for a causal effect (with 2 time series)
A few days ago, I came back to a sentence I found (in a French newspaper), where someone was claiming that “… an old variable explains 85% of the change in a new variable. So we can talk about...
Function basis and regression
In the first part of the course on linear models, we’ve seen how to construct a linear model when the vector of covariates \boldsymbol{x} is given, so that \mathbb{E}(Y|\boldsymbol{X}=\boldsymbol{x})...
Testing for Covid-19 in the U.S.
For almost a month, on a daily basis, we have been working with colleagues (Romuald, Chi and Mathieu) on modeling the dynamics of the recent pandemic. I learn a lot of things discussing with them, but we...
Regression discontinuity model for TV series
In September, we are usually happy to see our favorite TV series back on air… Or not? Because, admit it, if we are happy to see those characters back, most of the time, we are disappointed, too. So why...
Sharing pictures from holidays in the Canadian Rockies (with R)
My kids have a very popular blog (at least among their grandmothers) where they frequently post pictures from everyday life (since they live 5000km from them), as well as pictures taken from...
Hiding values in the output of the summary function for a (linear) regression
Since our Fall 2020 session will be 100% online (and off-site), I have to work hard this summer to prepare online quizzes and exams. I started to play intensively with Achim’s awesome r-exams package....
R0 and the exponential growth of a pandemic
For some dissemination work, I want to create a nice graph to explain the exponential growth in pandemics, related to the value of R_0. Recall that R_0 corresponds to the average number of people that...
R0 and the exponential growth of a pandemic, an update
A few days ago, I wrote a blog post – R0 and the exponential growth of a pandemic – where I was trying to generate some visualization of some exponential growth, in the context of a pandemic. After...
Trees and forests
For my ACT6100 weekly quiz, I usually generate some datasets, and then ask students to compare various predictive algorithms. Last week, it was about classification trees and random forests. And...
Insurance Pricing Game
Would you like to put your data science skills to the test? Imperial College London, Université du Québec à Montréal (UQAM), and actuarial institutes in Singapore, the UK, including the IFoA, and...
Lilliefors, Kolmogorov-Smirnov and cross-validation
In statistics, the Kolmogorov–Smirnov test is a popular procedure to test whether a sample \{x_1,\cdots,x_n\} is drawn from a distribution F, or usually F_{\theta_0}, where F_{\theta} is some parametric...
Some general thoughts on Partial Dependence Plots with correlated covariates
The partial dependence plot is a nice tool to analyse the impact of some explanatory variables when using nonlinear models, such as a random forest, or some gradient boosting. The idea (in dimension 2),...
From multinomial regression to binary classification on some Siamese data
There are two kinds of people in the world: people who think there are two kinds of people in the world and people who don’t (borrowed from Menand (2018)). Because things are always simpler when we...
Could there be incentives to cycle through a red light?
This is of course a rhetorical question! Because cyclists must stop when the light is red! … But … there is always that moment, on a bicycle, when you stop, and then you say to yourself the worst part...
Snow in Montréal (Canada)
Winter started a bit more than one month ago… but we have already experienced many snow storms… there is still a lot of snow in gardens and in the streets, I was wondering if it was that unusual, but...